ci(bench-gpu): harden teardown, cap pairs at 32, fix CUDA comment#736
Merged
Conversation
Review follow-ups on the GPU benchmark workflow: - Teardown: fall back to destroying by the unique RUN_LABEL when no instance id was recorded. The id file is written only after `create` succeeds and its JSON parses, so a box created in that window (concurrency cancel, or a parse failure) could otherwise leak and bill indefinitely. - Cap pairs at 32 (was 40) and round odd requests up to even (the AB/BA design wants even N); raise the job timeout to 210 min so a worst-case 32-pair run (64 proves + slow provisioning + dual CUDA build) fits without timing out after the expensive build. - Fix the CUDARC_PIN comment: the boxes are ~CUDA 12.8 (matching cuda-12080 and the cuda_max_good>=12.8 offer floor), not 13.0; tie it to the MIN_DRIVER guard as the opposite end of the same compatibility window. - Log only the needed fields of create.json instead of the full --raw response, so an unexpected sensitive field can't land in the run log. - Validate the workflow_dispatch branch name before it is interpolated into the remote `bash -lc` command. - Move the run-summary write into an always() step so workflow_dispatch failures are visible in the Actions summary rather than only the raw step log.
JuArce
approved these changes
Jun 29, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Review follow-ups for #724, targeting its branch so they fold into that PR. Only
.github/workflows/benchmark-gpu.ymlchanges.Correctness / cost
vastai create instancereturns and its JSON parses, but teardown was gated solely on that id file. A box created in that window — a concurrency cancel (cancel-in-progress: true) landing right after create, or a--rawschema change breaking the parse — would exist, bill indefinitely, and teardown would print "nothing to destroy". Teardown now falls back to destroying by the uniqueRUN_LABEL(gpu-bench-<run_id>-<run_attempt>) when no id is recorded, tolerating both possiblevastai show instances --rawshapes and only ever matching this run's own box.Docs
CUDARC_PINcomment. It claimed "the boxes are CUDA 13.0", which contradicts the actual pincuda-12080(CUDA 12.8) and thecuda_max_good>=12.8offer floor. Corrected to ~12.8, and tied to theMIN_DRIVER>=580guard as the opposite (too-old vs too-new) end of the same driver-compatibility window. The pin value itself was already correct — only the explanatory comment was wrong.Hardening (defense-in-depth)
create.json. Replacedcat create.jsonwith ajqprojection of only the needed fields, so an unexpected sensitive field in the--rawresponse can't land in the (collaborator-/world-readable) run log.workflow_dispatchbranch name against the git-ref-safe charset before it is interpolated into the remotebash -lccommand.workflow_dispatchis write-access only, so this isn't a privilege gain — just avoids feeding an unvalidated ref into a remote shell.workflow_dispatchfailures. The run-summary write lived at the end of the bench step, so it was skipped whenever the bench failed (set -e/pipefail) — and dispatch runs have no PR comment step, so a failed dispatch showed nothing but the raw log. Moved it to analways()step that prints the result on success and the log tail on failure.Deliberately left as-is per discussion: template name (
NVIDIA CUDA Lambda VM 64GBis correct), building/running PR code as root on the rented box (bounded — box holds no secrets, destroyed after, collaborator-gated), no global cost cap (cost acceptable given trusted-collaborator gating), andissues: write(matchesbench-abba.yml).Validated with
actionlint(clean) and unit-tested the clamp/round, branch validation, and teardown jq filter logic.